Analytics over Probabilistic Unmerged Duplicates

Authors

Ekaterini Ioannou

Minos N. Garofalakis

Abstract

This paper introduces probabilistic databases with unmerged duplicates (DBud), i.e., databases that contain probabilistic information about instances found to describe the same real-world objects. We discuss the need to query such databases efficiently and to support practical query scenarios that require analytical or summarized information. We also sketch possible methodologies and techniques for processing queries over such probabilistic databases efficiently, in particular without materializing the (potentially huge) collection of all possible deduplication worlds.
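
To make the notion of possible deduplication worlds concrete, the following is a minimal sketch and not the method of the paper. It assumes a hypothetical model in which each candidate duplicate link `(a, b, p)` is present independently with probability `p`, and a world merges records along its present links. All names (`count_entities`, `expected_entities_by_enumeration`, `expected_entities_forest`) and the sample data are illustrative assumptions. The sketch compares an aggregate (the expected number of distinct entities) computed by enumerating all 2^m worlds against a closed form that avoids enumeration but holds only when the links form a forest.

```python
from itertools import product


def count_entities(records, links):
    """Count connected components after merging records along `links` (union-find)."""
    parent = {r: r for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)
    return len({find(r) for r in records})


def expected_entities_by_enumeration(records, prob_links):
    """Materialize every world: 2^m subsets of the m uncertain links.

    Assumption: links are independent, each present with its own probability.
    """
    total = 0.0
    for mask in product([False, True], repeat=len(prob_links)):
        weight = 1.0
        chosen = []
        for present, (a, b, p) in zip(mask, prob_links):
            weight *= p if present else 1.0 - p
            if present:
                chosen.append((a, b))
        total += weight * count_entities(records, chosen)
    return total


def expected_entities_forest(records, prob_links):
    """Closed form without enumeration; valid only if the links contain no cycle.

    In a forest every present link merges exactly two components, so the
    expected count is n minus the sum of the link probabilities.
    """
    return len(records) - sum(p for _, _, p in prob_links)


# Hypothetical data: four records connected by a chain of uncertain links.
records = ["r1", "r2", "r3", "r4"]
prob_links = [("r1", "r2", 0.9), ("r2", "r3", 0.4), ("r3", "r4", 0.5)]

by_worlds = expected_entities_by_enumeration(records, prob_links)
by_formula = expected_entities_forest(records, prob_links)
print(by_worlds, by_formula)
```

The enumeration cost doubles with every additional uncertain link, which is the blow-up the abstract refers to. The closed form is only a special case; with cyclic links it no longer applies, and other techniques would be needed.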

Related resources

Probabilistic Data Structures for Web Analytics and Data Mining

Multi-granulation fuzzy probabilistic rough sets and their corresponding three-way decisions over two universes

This article introduces a general framework of multi-granulation fuzzy probabilistic rough sets (MG-FPRSs) models in multi-granulation fuzzy probabilistic approximation space over two universes. Four types of MG-FPRSs are established from four different conditional probabilities of a fuzzy event. For different constraints on parameters, we obtain four kinds of each type MG-FPRSs...

A Statistical Data Fusion Technique in Virtual Data Integration Environment

Data fusion in the virtual data integration environment starts after detecting and clustering duplicated records from the different integrated data sources. It refers to the process of selecting or fusing attribute values from the clustered duplicates into a single record representing the real world object. In this paper, a statistical technique for data fusion is introduced based on some proba...

Policy Analytics Generation Using Action Probabilistic Logic Programs

Action probabilistic logic programs (ap-programs for short) [15] are a class of the extensively studied family of probabilistic logic programs [14,21,22]. ap-programs have been used extensively to model and reason about the behavior of groups and an application for reasoning about terror groups based on ap-programs has users from over 12 US government entities [10]. ap-programs use a two sorted...

Υ-DB: A system for data-driven hypothesis management and analytics

The vision of Υ-DB introduces deterministic scientific hypotheses as a kind of uncertain and probabilistic data, and opens some key technical challenges for enabling data-driven hypothesis management and analytics. The Υ-DB system addresses those challenges throughout a design-by-synthesis pipeline that defines its architecture. It processes hypotheses from their XML-based extraction to encodin...

Publication date: 2014
